Simple, readable sub-sentences

Authors

  • Sigrid Klerke
  • Anders Søgaard
Abstract

We present experiments using a new unsupervised approach to automatic text simplification, which builds on sampling and ranking via a loss function informed by readability research. The main idea is that a loss function can distinguish good simplification candidates among randomly sampled sub-sentences of the input sentence. Native speakers rate the output of our approach as equally grammatical and as appropriate for beginner readers as that of a supervised SMT-based baseline system, but our setup makes more radical changes that better resemble the variation observed in human-generated simplifications.
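The sample-and-rank idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the token-dropping sampler, and in particular `toy_loss`, which only penalizes long words and deviation from a hypothetical target length. It is not the loss function used in the paper, and it makes no attempt to model grammaticality.

```python
import random


def sample_sub_sentences(tokens, n_samples=200, keep_prob=0.7, seed=0):
    """Randomly drop tokens to build candidate sub-sentences (word order kept)."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(n_samples):
        kept = tuple(t for t in tokens if rng.random() < keep_prob)
        # Keep only non-empty candidates that are strictly shorter than the input.
        if kept and len(kept) < len(tokens):
            candidates.add(kept)
    # Sort so that the result does not depend on set iteration order.
    return sorted(candidates)


def toy_loss(candidate, original, target_ratio=0.6):
    """Hypothetical readability-style loss (NOT the paper's loss function).

    Lower is better: short words are preferred, and the candidate length
    should be close to target_ratio times the original length.
    """
    avg_word_len = sum(len(t) for t in candidate) / len(candidate)
    length_ratio = len(candidate) / len(original)
    return avg_word_len + 10.0 * abs(length_ratio - target_ratio)


def simplify(sentence, **sampler_kwargs):
    """Return the sampled sub-sentence with the lowest toy loss."""
    tokens = sentence.split()
    candidates = sample_sub_sentences(tokens, **sampler_kwargs)
    if not candidates:
        return sentence  # nothing shorter was sampled, fall back to the input
    best = min(candidates, key=lambda c: toy_loss(c, tokens))
    return " ".join(best)


if __name__ == "__main__":
    print(simplify("The committee unanimously approved the controversial "
                   "proposal yesterday afternoon"))
```

Because the sampler is unsupervised and the ranking is driven entirely by the loss, changing the behaviour of such a system amounts to changing the loss terms; no parallel corpus of original and simplified sentences is required.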


Similar resources

Sub-Sentential Alignment Method by Analogy

This paper describes a method for searching word correspondences between pairs of translation sentences. In Example-Based Machine Translation, translation patterns can be extracted easily if word correspondences between pairs of translation sentences are defined. The popular methods for aligning a bilingual corpus at a sub-sentential level are unable to produce reliable results when the size of...


Discourse for Machine Translation

Statistical Machine Translation is a modern success: Given a source language sentence, SMT finds the most probable target language sentence, based on (1) properties of the source; (2) probabilistic source--target mappings at the level of words, phrases and/or sub-structures; and (3) properties of the target language. SMT translates individual sentences because the search space even for a single...


A Random, Semantically Appropriate Sentence Generator for Speaker Verification

We describe two systems for automatically generating English sentences, and evaluate the suitability of their output for speaker verification. The first system, SUSGen, generates grammatical but semantically anomalous sentences of controlled length, vocabulary and phonetic content. The second system, SASGen, extends SUSGen to generate a greater variety of sentences and ones that are, for the mo...


Learning to Explain Entity Relationships in Knowledge Graphs

We study the problem of explaining relationships between pairs of knowledge graph entities with human-readable descriptions. Our method extracts and enriches sentences that refer to an entity pair from a corpus and ranks the sentences according to how well they describe the relationship between the entities. We model this task as a learning to rank problem for sentences and employ a rich set of...


Sense Tagging In Action Combining Different Tests With Additive Weightings

This paper describes a working sense tagger, which attempts to automatically link each word in a text corpus to its corresponding sense in a machine-readable dictionary (MRD). It uses information automatically extracted from the MRD to find matches between the dictionary and the corpus sentences, and combines different types of information by simple additive scores with manually set weightings.




Publication date: 2013